Abstract

An effective way to increase the noise robustness of automatic speech recognition
is to label noisy speech features as either reliable or unreliable (missing) prior to
decoding, and to replace the missing ones by clean speech estimates. We present
a novel method to obtain such clean speech estimates. Unlike previous imputation
frameworks, which work on a frame-by-frame basis, our method focuses on exploiting
information from a large time-context. Using a sliding window approach,
denoised speech representations are constructed using a sparse representation of
the reliable features in an overcomplete basis of fixed-length exemplar fragments.
We demonstrate the potential of our approach with experiments on the AURORA-2
connected digit database.



1 Introduction 
Automatic speech recognition (ASR) performance degrades substantially when speech is corrupted
by background noise that was not seen during training. Missing Data Techniques (MDTs) [1, 2]
provide a powerful way to mitigate the impact of both stationary and non-stationary noise for a wide
range of signal-to-noise ratios (SNRs). The general idea behind MDT is that it is possible to estimate,
prior to decoding, which spectro-temporal elements of the acoustic representations are reliable
(i.e., dominated by speech) and which are unreliable (i.e., dominated by background noise). These
reliability estimates, referred to as a spectrographic mask, are used to treat reliable and unreliable
features differently. The mask information can for instance be used to replace the unreliable features
by clean speech estimates (e.g., [3, 4, 5]), which is called imputation.

Although, admittedly, impressive gains in recognition accuracy have been achieved using MDTs, at
SNRs < 0 dB the performance is often too poor for practical applications. A possible explanation
for the problems at low SNRs is the fact that most missing data imputation methods work on a
frame-by-frame basis (i.e., strictly local in time). However, at SNRs < 0 dB a substantial number of
frames may contain few, if any, reliable features. Therefore, there is an increased risk that individual
frames do not contain sufficient information for successful imputation.

In [6], we showed that this data scarcity problem at very low SNRs can be solved by a missing data
imputation method that uses a time window which is (much) wider than a single frame. This allows a
better exploitation of the redundancy of the speech signal. The technique, sparse imputation, works
by finding a sparse representation of the reliable features of an unknown word in an overcomplete
basis of noise-free example words. The projection of these sparse representations in the basis is
then used to provide clean speech estimates to replace the unreliable features. Since the imputation
framework introduced in [6] represents each word by a fixed-length vector, its applicability is limited
to situations where the word boundaries are known beforehand, such as in isolated word recognition.

In the current paper we extend sparse imputation for use in continuous speech recognition. Rather
than imputing whole words using a basis of exemplar words, we impute fixed-length sliding time
windows using a basis with examples of fixed-length fragments of clean speech. Our goal is to
establish to what extent this approach leads to better recognition accuracies at SNRs < 0 dB
compared to conventional ASR methods. The technique might bring practical applications within reach
that are substantially less vulnerable to noise. We evaluate our novel approach by comparing its
performance with that of a state-of-the-art frame-based imputation approach, using the AURORA-2
continuous digit recognition task [7]. First, we give an upper bound on the performance of both
techniques by using 'oracle' masks¹. Then we proceed to using an estimated harmonicity mask [8].

The rest of the paper is organized as follows. In Section 2 we briefly describe MDT. In Section 3
we introduce the sparse imputation framework. In Section 4 we extend this framework for use in
continuous ASR. In Section 5 we compare recognition accuracies with the baseline decoder and we
give our conclusions in Section 6. We conclude with a description of future work.



2 Missing Data Techniques 

In ASR, speech representations are typically based on some spectro-temporal distribution of acous- 
tic power, called a spectrogram. In noise-free conditions, the value of each element in this two- 
dimensional matrix is determined by the speech signal only. In noisy conditions, the acoustic power 
in each cell may also (in part) be due to background noise. Assuming the noise is additive, the
spectrogram of noisy speech, denoted by Y, can be described as the sum of the individual spectrograms
of clean speech S and noise N, i.e., Y = S + N. Elements of Y that predominantly contain speech or
noise energy are distinguished by introducing a spectrographic mask. With all spectrograms
represented as K × T dimensional matrices (K being the number of frequency bands and T the number
of time frames), a mask is defined as an equally sized matrix. Its elements are either 1, meaning the
corresponding cell of Y is dominated by speech ('reliable'), or 0, meaning it is dominated by noise
('unreliable' or 'missing'). Thus, we write:

$$ M(k,t) \overset{\text{def}}{=} \begin{cases} 1 \ (\text{reliable}) & \text{if } S(k,t) > N(k,t) \\ 0 \ (\text{unreliable}) & \text{otherwise} \end{cases} \tag{1} $$

with frequency band k ($1 \le k \le K$) and time frame t ($1 \le t \le T$). Then, if the power spectrum of
the noisy speech is represented on a log-compressed scale, we may write for reliable features:

$$ \log[Y(k,t)] = \log[S(k,t) \cdot (1 + N(k,t)/S(k,t))] \approx \log[S(k,t)] \tag{2} $$

In other words, under the assumption of additive background noise, reliable noisy speech coefficients 
can be used directly as estimates of the clean speech features. 
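
As a minimal sketch of Eqs. 1 and 2, the snippet below builds an oracle mask from separately available clean-speech and noise power spectrograms and checks how far log Y deviates from log S on reliable cells. The random spectrograms and the number of frames are illustrative assumptions, not data from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 23, 50                        # frequency bands x time frames (toy sizes)
S = rng.uniform(0.1, 10.0, (K, T))   # hypothetical clean-speech power spectrogram
N = rng.uniform(0.1, 10.0, (K, T))   # hypothetical noise power spectrogram
Y = S + N                            # additive-noise assumption: Y = S + N

# Eq. 1: a cell is reliable (1) when speech power exceeds noise power.
M = (S > N).astype(int)

# Eq. 2: log Y exceeds log S by log(1 + N/S). On reliable cells N/S < 1,
# so the deviation stays below log(2).
err = np.log(Y) - np.log(S)
max_err_reliable = err[M == 1].max()
```

On a natural-log scale the deviation on reliable cells is therefore bounded by log 2, which is the sense in which reliable noisy features can stand in for clean ones.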

In experiments with artificially added noise, the mask can be computed using knowledge about the
corrupting noise and the clean speech signal, the so-called oracle masks. In realistic situations,
however, the masks must be estimated. Many different estimation techniques have been proposed,
such as SNR-based estimators [9], methods that focus on speech characteristics, e.g., harmonicity-based
SNR estimation [8], and mask estimation by means of Bayesian classifiers [10]. We refer
the reader to [11] and the references therein for a more complete overview of mask estimation
techniques. In Section 5 we will use one of these masks (i.e., the harmonicity mask [8]) to illustrate
the properties of our method in combination with an estimated mask.

Techniques for ASR with missing data can be divided into imputation and marginalization. With
marginalization [2], missing values are ignored during the decoding by integrating over their possible
ranges. With imputation [3], missing features are replaced by estimates (expected values extracted
from the training set). In this paper we will only consider imputation. Imputation may be viewed as
a data cleaning technique, enabling the use of conventional ASR systems that perform recognition
as if all features were reliable. Imputation techniques may also be integrated in an ASR engine, as
illustrated by a successful approach called conditioned imputation [4]. The latter approach, which
we will use in Section 5 to compare our new method against, makes the clean speech estimates
dependent on the hypothesized state of the hidden Markov model. Furthermore, it imposes the
additional constraint that the power of the clean speech estimates must not exceed the observed
noisy speech power.

¹ Oracle masks are masks in which reliability decisions are based on a priori knowledge, not available in
practical settings, about the extent to which each time-frequency cell is dominated by either noise or speech.

3 Imputation using sparse representations 

3.1 Sparse representation of speech signals 

We express the K × T spectrogram matrix of noisy speech Y as a single vector y of dimension
D = K · T by concatenating T subsequent time frames. For the moment, we assume T to be fixed,
which in practice means we have to time-normalize all utterances we want to process. As in [6], we
consider y to be a non-negative linear combination of exemplar spectrograms $a_n$, where n denotes
a specific exemplar ($1 \le n \le N_A$) in the set of $N_A$ available exemplars. We write:

$$ y = \sum_{n=1}^{N_A} x_n a_n = Ax \tag{3} $$

with weights $x_n \ge 0$, $x_n \in \mathbb{R}$, x an $N_A$-dimensional weight vector, and
$A = (a_1 \; a_2 \; \ldots \; a_{N_A-1} \; a_{N_A})$ a matrix with dimensionality $D \times N_A$.

Typically, the number of exemplar spectrograms will be much larger than the dimensionality of
the acoustic representation ($N_A \gg D$). Therefore, the system of linear equations has no unique
solution. Research in the field of compressed sensing [12, 13] has shown, however, that if x is sparse,
x can be determined exactly by solving:

$$ \min_x \{ \|x\|_0 \} \quad \text{subject to} \quad y = Ax \tag{4} $$

with $\|\cdot\|_0$ the $\ell^0$ zero norm (i.e., the number of nonzero elements).

3.2 $\ell^1$ minimization

The combinatorial problem in Eq. 4 is NP-hard and therefore cannot be solved in practical
applications. However, it has been proven that, with mild conditions on the sparsity of x and the structure
of A, x can be determined [14] by solving:

$$ \min_x \{ \|x\|_1 \} \quad \text{subject to} \quad y = Ax \tag{5} $$

This convex minimization problem can be cast as a least squares problem with an $\ell^1$ penalty:

$$ \min_x \{ \|Ax - y\|_2 + \lambda \|x\|_1 \} \tag{6} $$

with a regularization parameter $\lambda$ and a non-negativity constraint on x. If x, with sparsity
$f = \|x\|_0$, is very sparse, Eq. 6 can be solved efficiently in $O(f^3 + N_A)$ time using homotopy
methods [15].
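
To make the $\ell^1$-penalized problem concrete, here is a minimal sketch of a non-negative solver. It is not the homotopy method cited above: it uses plain projected proximal-gradient iterations (ISTA) and the common squared form 0.5·||Ax − y||² of the data term, both chosen here for brevity. The function name `nonneg_lasso`, the toy basis and the value of `lam` are assumptions made for illustration.

```python
import numpy as np

def nonneg_lasso(A, y, lam=0.01, n_iter=2000):
    """Minimise 0.5*||Ax - y||_2^2 + lam*||x||_1 subject to x >= 0 (projected ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        # Gradient step followed by the proximal map of lam*||x||_1 restricted to x >= 0.
        x = np.maximum(0.0, x - step * (grad + lam))
    return x

rng = np.random.default_rng(0)
D, N_A = 40, 200                              # overcomplete basis: N_A >> D
A = np.abs(rng.normal(size=(D, N_A)))         # hypothetical non-negative exemplars
A /= np.linalg.norm(A, axis=0)                # unit-norm columns
x_true = np.zeros(N_A)
x_true[[5, 77, 150]] = [1.0, 0.5, 2.0]        # sparse ground-truth weights
y = A @ x_true

x_hat = nonneg_lasso(A, y)
residual = np.linalg.norm(A @ x_hat - y)
```

With a step size of one over the squared spectral norm of A the objective decreases monotonically, so the residual never exceeds its starting value ||y||; how closely `x_hat` matches `x_true` depends on the basis and on `lam`.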



3.3 Sparse imputation 

To distinguish between reliable and unreliable features in y we do not solve Eq. 6 directly, but carry
out a weighted norm minimization instead:

$$ \min_x \{ \|WAx - Wy\|_2 + \lambda \|x\|_1 \} \tag{7} $$

with W a diagonal matrix of which the elements are determined directly by the binary missing data
mask M and are either 0 or 1. By concatenating subsequent time frames of M, similarly as we
did for the spectrogram Y, we construct a vector m to represent the weights on the diagonal of
W: diag(W) = m. Thus, we effectively use W as a row selector picking only those rows of A and
y that are assumed to contain reliable data.

As suggested in [16], it is possible to use the sparse representation x obtained from solving Eq. 7 to
estimate the missing values of y by reconstruction:

$$ \hat{y} = Ax \tag{8} $$

$\hat{y}$ is obtained by a linear combination of corresponding elements of the basis vectors, the weights of
which were determined using only reliable data. Hence, a version of $\hat{y}$ that is reshaped into a K × T
matrix can be considered a denoised spectrogram of the underlying speech signal.
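
The row-selector view of W and the reconstruction step can be sketched as follows. The sparse weight vector is simply assumed to be known here (in practice it would come from solving Eq. 7), and the noise is idealised so that reliable entries equal the clean values exactly. The final step, keeping reliable observations and filling in the rest, is one possible way to assemble the output. All sizes and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N_A = 12, 30
A = rng.uniform(0.0, 1.0, (D, N_A))               # hypothetical exemplar basis
x = np.zeros(N_A)
x[[3, 17]] = [0.8, 1.5]                           # sparse weights, assumed already found
clean = A @ x

m = (rng.uniform(size=D) > 0.4).astype(float)     # hypothetical binary mask, 1 = reliable
y = clean + (1.0 - m) * rng.uniform(1.0, 3.0, D)  # idealised: noise only on unreliable rows

W = np.diag(m)
rel = m == 1

# For any candidate weight vector, weighting by a binary diagonal W gives the
# same residual as keeping only the reliable rows of A and y.
x_trial = rng.uniform(size=N_A)
res_weighted = np.linalg.norm(W @ A @ x_trial - W @ y)
res_selected = np.linalg.norm(A[rel] @ x_trial - y[rel])

# Eq. 8: reconstruct every coefficient from the sparse weights.
y_hat = A @ x
# Keep reliable observations and replace unreliable ones by the reconstruction.
imputed = np.where(rel, y, y_hat)
```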

3.4 Theoretical bounds on successful imputation 

Obviously, no restoration is possible if y does not contain any reliable coefficients at all. In practice, 
a minimum number of reliable coefficients will be required for successful restoration of y. While 
theoretical bounds exist (cf. [12, 13, 16]), these are not of great practical value because they depend
both on the structure of WA and the sparsity of x. Unfortunately, WA changes from utterance to 
utterance as the environmental noise changes. Furthermore, the bounds are NP-hard to establish. 
Hence, we always perform sparse imputation except when no reliable features are present at all, 
thus accepting the risk of a flawed restoration. 

4 Generalization to time-continuous imputation 

The approach described in Section 3 is suited for speech units and exemplars that can be adequately
represented by an equal number of time frames T, e.g., in isolated word recognition [6]. However,
a fixed T is not meaningful for utterances of arbitrary length, so the approach cannot be applied
directly to continuous speech recognition. In this section we extend the sparse imputation framework
for use with speech signals of arbitrary length by using a sliding, fixed-length time window. Robustness
against windows with few or no reliable features is provided by using overlapping windows.

4.1 Time-shifted imputation 

We divide an utterance y of T frames into a series of overlapping time windows of R frames and
perform imputation for every individual window with the method described in Section 3. As
illustrated in Figure 1, imputation of feature values that belong to overlapping windows is done by
averaging the imputed feature values in the individual windows. As before, the basis A is formed by
$N_A$ exemplar vectors, which are reshaped versions of spectrograms (spanning R frames). With the
spectrogram dimensions being K × R, the vectors have size L = K · R, and the dimensions of A
are $L \times N_A$.

The number of windows I needed for processing the entire speech signal y of dimension D = K · T
is given by $I = \lceil (D - L)/\Delta \rceil + 1$, with $\Delta$ the window shift expressed as the number of rows
in y over which the window is shifted. $\Delta$ is a multiple of K because y is a vector of concatenated
frames, each with K coefficients. We denote the row indices of W and y that correspond to the
coefficients in the i-th window by r (with both i and r representing natural numbers, $1 \le i \le I$
and $i\Delta < r \le i\Delta + L$). At the beginning of the utterance there will be L such rows; in the final
window, this number reduces from L to $D - I \cdot \Delta$ (cf. Figure 1). For every window we compute a
sparse representation x as follows:

$$ \min_x \{ \|W_r A x - W_r y_r\|_2 + \lambda \|x\|_1 \} \tag{9} $$

The imputed spectrogram for that window, which we will denote by $\gamma$, is computed as $\gamma = Ax$.
The use of overlapping imputation windows results in multiple imputation candidates. As depicted
in Fig. 1, we have chosen to compute the final clean speech estimate of the d-th component of y,
denoted by $\hat{y}_d$, as the average of all imputation candidates resulting from overlapping windows.
The number of imputation candidates ranges from 1 (at the beginning and end of an utterance) to
$\lceil L/\Delta \rceil$.
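
The window bookkeeping above can be sketched in a few lines. For readability the sketch counts in frames rather than in rows of y (so the shift is $\Delta / K$ frames, and D and L are replaced by T and R) and uses zero-based window start positions, which is an indexing assumption of this example. The per-window imputations are replaced by constant stand-in matrices, and the helper names are hypothetical.

```python
import math
import numpy as np

def window_starts(T, R, shift):
    """Start frames of the sliding windows; the count is ceil((T - R)/shift) + 1."""
    n_windows = math.ceil(max(T - R, 0) / shift) + 1
    return [i * shift for i in range(n_windows)]

def average_candidates(candidates, starts, K, T):
    """Average per-window imputations (each K x width) over the frames they overlap."""
    total = np.zeros((K, T))
    count = np.zeros(T)
    for start, cand in zip(starts, candidates):
        width = cand.shape[1]
        total[:, start:start + width] += cand
        count[start:start + width] += 1
    return total / count, count

K, T, R, shift = 23, 100, 35, 10
starts = window_starts(T, R, shift)
# Stand-in candidates: one constant spectrogram per window instead of a real imputation.
# The final window is truncated at the end of the utterance.
candidates = [np.full((K, min(R, T - s)), float(i)) for i, s in enumerate(starts)]
estimate, count = average_candidates(candidates, starts, K, T)
```

In this toy setting `count` holds the number of imputation candidates per frame, which is 1 at both ends of the utterance and at most ceil(R/shift) in between.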
Figure 1: Schematic diagram of imputation using overlapping windows. The imputed value for an
unreliable coefficient is computed by averaging over all imputation candidates from overlapping windows.



5 Experiments 

To compare the recognition accuracies obtained with the sparse imputation method with those of a 
conventional, frame based MDT approach, we use a continuous digit recognition task. First, we de- 
termine the maximum achievable recognition accuracy for both methods when a priori information 
is provided about speech and noise in the form of an oracle mask. Second, we study the behaviour 
of the new imputation method using an estimated mask. 

Recognition performance through sparse imputation may be affected by three parameters: the basis
size $N_A$, the window size R, and the window shift $\Delta$. In this paper, we keep $N_A$ and R fixed and
first investigate how recognition accuracy varies with window shift. In the remaining experiments
we study the differences in recognition performance between the sparse imputation method and a
conventional frame-based recognizer in more detail using only the best scoring window shift.



5.1 Experimental setup 

The speech material used for evaluation is taken from test set 'A' of the AURORA-2 corpus [7].
The utterances contain one to seven digits, artificially mixed with four different types of noise, viz.
subway, car, babble, and exhibition hall. We evaluate recognition accuracy as a function of SNR at
the four lowest SNR levels present in the corpus, viz. 10, 5, 0, and −5 dB. The results we report
are averages over the four noise conditions. The spectrographic representations of the noise N
and clean speech S are available independently. To reduce computation times, we used a random,
representative subset of 10% of the utterances (i.e., ≈ 400 utterances per SNR level).

The exemplar spectrograms in the basis matrix A were created by extraction of spectrogram
fragments of randomly selected utterances in the clean train set of AURORA-2, using a random offset.
The length of the exemplars was chosen as R = 35 frames, which equals the mean number of frames
of a single digit [6]. Thus, the exemplars typically represent sequences of parts of digits. A pilot
study with basis sizes ranging from $N_A$ = 4000 to $N_A$ = 14000 revealed that recognition accuracy
did not increase with $N_A$ > 8000. We therefore use a basis size $N_A$ = 8000 throughout this paper.
The window shifts experimented with are 1, 5, 10, 15, 20, 25, 30, and 35 frames.
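
A minimal sketch of how such a basis could be assembled is given below. The function `build_basis`, the random stand-in spectrograms and the small basis size are assumptions for illustration. The actual experiments use clean AURORA-2 training utterances and $N_A$ = 8000.

```python
import numpy as np

def build_basis(spectrograms, R, n_exemplars, rng):
    """Stack randomly placed K x R fragments as columns of a (K*R) x n_exemplars matrix."""
    usable = [s for s in spectrograms if s.shape[1] >= R]
    columns = []
    for _ in range(n_exemplars):
        spec = usable[rng.integers(len(usable))]        # randomly selected utterance
        offset = rng.integers(spec.shape[1] - R + 1)    # random start frame
        fragment = spec[:, offset:offset + R]
        # Concatenate the R frames (each with K coefficients) into one vector.
        columns.append(fragment.T.reshape(-1))
    return np.stack(columns, axis=1)

rng = np.random.default_rng(0)
K, R = 23, 35
# Hypothetical stand-ins for clean training spectrograms of varying length.
train = [rng.uniform(size=(K, int(T))) for T in rng.integers(40, 200, size=20)]
A = build_basis(train, R, n_exemplars=50, rng=rng)
```

Each column of `A` has length L = K · R, so it has the same layout as a windowed segment of the vector y.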

For the baseline system, we used the state-of-the-art missing data recognition system described in
[4, 17]. Acoustic feature vectors consisted of Mel frequency log power spectra (K = 23 bands).
Unreliable features are replaced by estimated values using maximum likelihood per Gaussian based 
imputation [4]. The acoustic representations obtained with our sparse imputation method were pro- 
cessed by the baseline system using a spectrographic mask that considers every time-frequency cell 
as reliable (thus performing no additional missing data imputation). 

We use two different masks to describe the reliability of time-frequency cells: 1) an oracle
mask and 2) an estimated mask in the form of a harmonicity mask [8]. The imputation
method was implemented in MATLAB. The $\ell^1$ minimization was carried out using the
SolveLasso solver implemented as part of the SparseLab toolbox, which can be obtained from
www.sparselab.stanford.edu.
Figure 2: Word recognition accuracy as a function of window shift. The left pane shows results for
the oracle mask and the right pane for the harmonicity mask. Window shift is expressed in frames.



5.2 Results and discussion 

5.2.1 Speech recognition accuracy as a function of window shift 

Figures 2a (for the oracle mask) and 2b (for the harmonicity mask) show recognition accuracy as
a function of the window shift in frames. Both figures show that recognition accuracy steadily
decreases as the window shift increases. Moreover, for the oracle mask the performance at low
SNRs decreases faster for larger window shifts. This is most likely due to the number of windows
with few or no reliable features: the larger the window shift, the fewer overlapping windows there
are. As a consequence, the number of windows containing insufficient reliable features for successful
data restoration increases. The best results are obtained using a window shift
of one frame, corresponding to $\Delta = K$. This shift will be used in the remainder of this paper.



5.2.2 Comparison with baseline decoder: oracle mask 

When used with an oracle mask, sparse imputation achieves much higher recognition accuracies
than the baseline (cf. filled circles in Figure 3). At SNR = −5 dB it reaches a recognition accuracy
of 86%, a major improvement over the 56% obtained by the baseline decoder. While one should
be aware that these results constitute an upper bound on recognition accuracy, it is promising to
observe that the unreliable features can be reconstructed so well, even at very low SNRs, provided
the reliable features can be identified correctly.

The 30% absolute improvement over the baseline decoder is similar to the improvement of 31% reported
in [6] for isolated digit recognition. This corroborates the potential of sparse imputation for ASR:
to our knowledge, this is the first missing data technique that successfully exploits information from
larger time windows and can be combined with conventional continuous speech decoding.



5.2.3 Comparison of accuracies using oracle versus harmonicity mask 

The results obtained with the estimated harmonicity mask are depicted by diamonds in Figure 3.
Clearly, the recognition accuracies are much lower than with the oracle mask, suggesting that the
harmonicity mask does not succeed in identifying all reliable coefficients as such. Indeed, Fig. 4
shows that the percentage of features that is labeled reliable is substantially lower than in the oracle
mask. Yet, the lower recognition accuracies cannot solely be attributed to the reduced number of
reliable features. For example, consider the recognition accuracy with sparse imputation for the
harmonicity mask at SNR = 5 dB. The number of reliable features is roughly equal to that of the
oracle mask at SNR = −5 dB, while the recognition accuracy is much lower (65% vs. 86%),
indicating that the reliable features of the harmonicity mask lack crucial information. However, the
fact that the sparse imputation accuracies are lower than those of the baseline also indicates that the
current implementation of sparse imputation does not use all information that is available.
Figure 3: Word recognition accuracy for both
the baseline decoder and the sparse imputation
method using the oracle mask, the harmonicity
mask and the corrected harmonicity mask,
respectively. The window shift is one frame.



Figure 4: Percentage of time-frequency cells 
classified as reliable in the oracle mask and the 
harmonicity mask. Additionally, the percent- 
age of false reliables in the harmonicity mask 
is shown. 



5.2.4 Comparison with baseline decoder: harmonicity mask 

It is conceivable that the sparse imputation method is more sensitive to false reliables, i.e., features
labeled reliable by the harmonicity mask while in fact being unreliable (cf. Fig. 4). In order to
test this hypothesis we performed recognition using a corrected version of the harmonicity mask
without false reliables. The asterisks in Fig. 3 illustrate that sparse imputation now performs better
than the baseline at SNR = −5 dB, comparably at SNR = 0 dB and worse at SNRs > 0 dB. Also, the
overall increase in recognition accuracy is much larger for the sparse imputation framework than for
the baseline decoder, confirming that the method is indeed more sensitive to false reliables.

The fact remains, however, that even with the false reliables of the harmonicity mask removed, a
substantial performance gap remains compared to oracle performance. This indicates that
not only the number of reliable features is important for correct imputation, but also their location
in the time-frequency plane. Apparently, the extra features labeled reliable by the oracle mask in
comparison to the harmonicity mask contain information that is crucial for a correct imputation:
the success of finding a sparse representation depends on the exact structure of WA, as described
in Section 3.4. In comparison with the baseline method, the current implementation of the sparse
imputation technique too often finds an incorrect imputation result.

When using estimated masks, as opposed to oracle masks, more attention is apparently needed for
the constraints that determine the sparse solution of Eq. 7. The sparse imputation technique, unlike
the baseline decoder, does not take into account that clean speech estimates are bounded by the
observation energy. Adding this as an additional constraint to the minimization in Eq. 7 might improve
the success of finding the correct imputation. Another way to further constrain the minimization
would be to increase the window length R. The chosen window length R = 35, the mean length of
a digit, implies that many speech examples contain only parts of digits. Larger windows both provide
more contextual information and increase the dimensionality of the minimization problem.
Informal pilot tests suggest larger window sizes improve performance, but a systematic investigation
of this aspect is left as future research.
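
As a rough illustration of the bound on clean speech estimates, the sketch below clips imputed values so that they never exceed the observed noisy features. This post-hoc clipping is a simpler stand-in chosen for this example; it is not the constrained minimization suggested above, which would enforce the bound while the sparse representation is being computed. The function name and all values are hypothetical.

```python
import numpy as np

def bound_by_observation(y_hat, y_noisy, reliable):
    """Clip imputed values so they never exceed the observed noisy features."""
    bounded = np.minimum(y_hat, y_noisy)
    # Reliable cells keep the observation itself.
    return np.where(reliable, y_noisy, bounded)

y_noisy = np.array([5.0, 7.0, 6.0, 9.0])     # hypothetical log-power observations
y_hat = np.array([4.0, 8.5, 6.5, 3.0])       # hypothetical sparse-imputation output
reliable = np.array([True, False, False, False])
result = bound_by_observation(y_hat, y_noisy, reliable)
# result is [5.0, 7.0, 6.0, 3.0]: the second and third estimates are clipped.
```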



6 Conclusions 

We introduced a new method for imputation of missing data in continuous speech recognition. It 
replaces noise-corrupted features in a sliding window by clean speech estimates which are computed 
using a sparse representation of the reliable features in an overcomplete basis of exemplar speech 
fragments. Imputation results from overlapping windows are combined by averaging. 
The sparse imputation approach was shown to vastly outperform a classical frame-based approach
at low SNRs (accuracy of 86% vs. 56% at SNR = −5 dB) when tested on a continuous digit
recognition task and using an oracle mask. Furthermore, we showed that overlapping windows increase
robustness against windows that coincidentally yield a wrong imputation result.

Using estimated masks, we were not able to achieve similarly impressive improvements as with oracle
masks. Clearly, when the reliability estimates cannot be guaranteed to be correct, the current
implementation easily yields erroneous imputations and additional measures are needed to avoid those.
We suggested several ways to introduce additional constraints when searching for the optimal sparse
representation of reliable features. Finding out which of these are the most effective is the subject
of future research.

Future work 

Future work will focus on refining the framework in several ways to improve performance: 

• Recognition accuracy might benefit from a larger window size since this would provide extra 
constraints when finding a sparse representation. 

• Analogously to our baseline, sparse imputation might also profit from bounded imputation. This
would require an additional cost function in Eq. 7 that enforces $(1 - W)Ax \le (1 - W)y$.

• Research [18] has shown that it is beneficial to substitute the hard decision in a binary mask by
the probability that a certain feature is unreliable. The weighting matrix W in Eq. 9 supports the
use of such 'fuzzy' masks without further adaptations to the framework.
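
To illustrate the last point, the sketch below evaluates the weighted residual of Eq. 7 with weights anywhere in [0, 1] instead of binary values. Mapping the probability of being unreliable to a weight via 1 − p is an assumption made for this example, as are the function name and the random data.

```python
import numpy as np

def weighted_residual(A, x, y, weights):
    """||W A x - W y||_2 with W = diag(weights); weights may lie anywhere in [0, 1]."""
    return np.linalg.norm(weights * (A @ x - y))

rng = np.random.default_rng(2)
D, N_A = 8, 20
A = rng.uniform(size=(D, N_A))
x = rng.uniform(size=N_A)
y = rng.uniform(size=D)

p_unreliable = rng.uniform(size=D)           # hypothetical per-feature probabilities
soft = 1.0 - p_unreliable                    # assumed mapping from probability to weight
hard = (p_unreliable < 0.5).astype(float)    # binary mask obtained by thresholding

r_soft = weighted_residual(A, x, y, soft)
r_hard = weighted_residual(A, x, y, hard)
r_full = weighted_residual(A, x, y, np.ones(D))
```

Because only the diagonal of W changes, the same minimization can be run with either kind of mask.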

Acknowledgments 

The research of Jort Gemmeke was carried out in the MIDAS project, granted under the Dutch- 
Flemish STEVIN program. The project partners are the universities of Leuven, Nijmegen and the 
company Nuance. We acknowledge useful discussions with Lou Boves.

References 

[1] B. Raj, R. Singh, and R. Stern, "Inference of missing spectrographic features for robust automatic speech
recognition," in Proc. International Conference on Spoken Language Processing, 1998, pp. 1491-1494.

[2] M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing
and unreliable acoustic data," Speech Communication, vol. 34, pp. 267-285, 2001.

[3] B. Raj, "Reconstruction of incomplete spectrograms for robust speech recognition," Ph.D. dissertation,
Carnegie Mellon University, 2000.

[4] H. Van hamme, "Prospect features and their application to missing data techniques for robust speech
recognition," in Proc. INTERSPEECH-2004, 2004, pp. 101-104.

[5] L. Josifovski, M. Cooke, P. Green, and A. Vizinho, "State based imputation of missing data for robust
speech recognition and speech enhancement," in Proc. of Eurospeech, 1999.

[6] Anonymous, "Using sparse representations for missing data imputation in noise robust speech
recognition," to appear in Proc. of EUSIPCO 2008, 2008.

[7] H. Hirsch and D. Pearce, "The aurora experimental framework for the performance evaluation of speech
recognition systems under noisy conditions," in Proc. of ISCA ASR2000 Workshop, Paris, France, 2000,
pp. 181-188.

[8] H. Van hamme, "Robust speech recognition using cepstral domain missing data techniques and noisy
masks," in Proc. of IEEE ICASSP, vol. 1, 2004, pp. 213-216.

[9] A. Vizinho, P. Green, M. Cooke, and L. Josifovski, "Missing data theory, spectral subtraction and
signal-to-noise estimation for robust ASR: An integrated study," in Proc. of Eurospeech, 1999, pp. 2407-2410.

[10] W. Kim and R. M. Stern, "Band-independent mask estimation for missing-feature reconstruction in the
presence of unknown background noise," in Proc. of IEEE ICASSP, 2006.

[11] C. Cerisara, S. Demange, and J.-P. Haton, "On noise masking for automatic missing data speech
recognition: A survey and discussion," Comput. Speech Lang., vol. 21, no. 3, pp. 443-457, 2007.

[12] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp.
1289-1306, 2006.

[13] E. J. Candès, "Compressive sampling," in Proc. of the International Congress of Mathematicians, 2006.

[14] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal l1-norm solution
is also the sparsest solution," Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797-829,
2006.

[15] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, vol. 32,
no. 2, pp. 407-499, 2004.

[16] Y. Zhang, "When is missing data recoverable?" Technical Report, 2006.

[17] H. Van hamme, "Handling time-derivative features in a missing data framework for robust automatic
speech recognition," in Proc. of IEEE ICASSP, 2006.

[18] J. Barker, L. Josifovski, M. Cooke, and P. Green, "Soft decisions in missing data techniques for robust
automatic speech recognition," 2000, pp. 373-376.



